INTRODUCTION

The primary goal of this research project is to analyze scenes in
terms of three-dimensional object descriptions. Stereo pairs of
images are to be used to determine the x, y and z coordinates of
surfaces within the scene. The three-dimensional surfaces are then
to be analyzed as solid objects.

So far, the research has concentrated on methods for determining
x, y and z coordinates from two stereo images. This report contains
a brief review of previous work in computer depth determination
using stereopsis, a review of this project's current efforts, and
an outline of problem areas to be investigated.

It should be noted that the amount of work done by other
researchers in the fields of image processing, image coding, image
registration and scene analysis is tremendous. Much of this
project's effort so far has gone into reviewing that research and
analyzing the problem. As with most research projects, once this
broad base of knowledge is established a great deal of progress
can be made. The next six months of this project should be highly
productive in terms of solutions to the problems involved in
computer binocular vision.


I. Review


The basic procedure in any stereopsis situation is to locate
corresponding points within the two images and then, through
reasonably straightforward trigonometry, determine the x, y and z
location of each point.
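The trigonometry can be sketched for the simplest arrangement. The sketch assumes an idealized geometry that this report does not specify. There are two identical pinhole cameras with parallel optical axes, a horizontal baseline B and a focal length f. Image coordinates are measured from each image center. A matched pair then has disparity d = x_left - x_right, depth follows as Z = fB/d, and X and Y follow by similar triangles. StereoRig and triangulate are hypothetical names.

```python
from dataclasses import dataclass

# Assumption: idealized parallel-axis pinhole cameras; not the project's actual rig.


@dataclass
class StereoRig:
    focal_length: float  # f, in the same units as image coordinates
    baseline: float      # B, horizontal separation of the cameras, in scene units


def triangulate(rig: StereoRig, x_left: float, x_right: float, y: float):
    """Return (X, Y, Z) in left-camera coordinates for one matched point pair.

    Image coordinates are measured from each image center. With parallel
    axes a matched pair lies on the same image row, so one y is used.
    """
    disparity = x_left - x_right
    if disparity <= 0:
        raise ValueError("a point in front of both cameras has positive disparity")
    z = rig.focal_length * rig.baseline / disparity  # similar triangles: Z = fB/d
    x = x_left * z / rig.focal_length                # back-project through the left camera
    y_world = y * z / rig.focal_length
    return x, y_world, z


rig = StereoRig(focal_length=500.0, baseline=0.1)
print(triangulate(rig, x_left=30.0, x_right=20.0, y=-5.0))
```

Depth is inversely proportional to disparity. A one-unit matching error therefore matters far more for distant points than for near ones.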
Quite obviously, this procedure cannot be performed for every
point within the scene. Locating a point in the second image that
corresponds to a point in the first image is possible only when
some structure exists around the point in question. Uniformly
shaded areas within images typically represent flat surfaces, or
curved surfaces containing no sharp irregularities. A point located
within such an area cannot be distinguished from other points
within the area, so no match is possible.

A procedure for depth determination, then, usually starts by
finding a subarea about a point in one image that has a high
likelihood of being matched with a subarea in the second image.
Subareas whose variance exceeds some threshold may be used as
targets for matching. This measure can lead to false expectations,
and consequently wasted effort, when the image contains regions
that are textured (hence with high subarea variance) but of uniform
texture. Another strategy involves finding edges (directed
variance) in the image and using the subareas that contain edges as
targets for matching. Care must be taken, however, to use subareas
that contain more than one edge, preferably with the edges not
oriented in the same direction. This is necessary to avoid trying
to match a target with the multitude of subareas within the second
image that may lie along a common edge (see Fig. 1). Feature
selection, then, is an important first step in the matching
process. Selecting edge intersections (corners) as the locations of
target areas can eliminate much wasted effort.
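The two target-selection tests can be sketched as follows. This is an illustration under stated assumptions, not the project's code:

- subareas are square;
- the thresholds are arbitrary placeholders;
- the "more than one edge direction" test uses a gradient structure tensor, requiring its smaller eigenvalue to be large.

The structure tensor is a standard corner measure substituted here for illustration; this report does not describe it. All function names are hypothetical.

```python
import numpy as np

# Assumptions: square subareas, placeholder thresholds, and a structure-tensor
# test standing in for "edges in more than one direction".


def subarea_variance(image, row, col, half):
    """Grey-level variance of the (2*half + 1)-square subarea centered on (row, col)."""
    patch = image[row - half:row + half + 1, col - half:col + half + 1]
    return float(np.var(patch.astype(float)))


def min_structure_eigenvalue(image, row, col, half):
    """Smaller eigenvalue of the gradient structure tensor over the subarea.

    One straight edge gives one large and one near-zero eigenvalue; edges in
    two different directions (for example a corner) make both large.
    """
    # one-pixel margin so central differences are available inside the subarea
    patch = image[row - half - 1:row + half + 2, col - half - 1:col + half + 2].astype(float)
    gy, gx = np.gradient(patch)
    gx, gy = gx[1:-1, 1:-1], gy[1:-1, 1:-1]
    sxx, syy, sxy = (gx * gx).sum(), (gy * gy).sum(), (gx * gy).sum()
    # closed-form smaller eigenvalue of the 2x2 matrix [[sxx, sxy], [sxy, syy]]
    return float(0.5 * (sxx + syy) - np.sqrt(0.25 * (sxx - syy) ** 2 + sxy ** 2))


def is_good_target(image, row, col, half=3, var_thresh=100.0, eig_thresh=50.0):
    """Accept a subarea only if it has high variance AND structure in two directions."""
    return (subarea_variance(image, row, col, half) > var_thresh
            and min_structure_eigenvalue(image, row, col, half) > eig_thresh)


edge = np.zeros((32, 32))
edge[:, 16:] = 200.0            # a single vertical edge
corner = np.zeros((32, 32))
corner[16:, 16:] = 200.0        # two edges meeting at a corner
print(is_good_target(edge, 16, 16), is_good_target(corner, 16, 16))
```

The single-edge subarea passes the variance test but fails the two-direction test, which is exactly the Fig. 1 situation.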


Fig. 1. Target chosen, showing several possible
matches within the second image.

Normalized cross-correlation is probably the most widely used
measure of match between subareas of two images. Rather than
attempting to match a target area with subareas centered about
every point in the second image, various heuristics are used to
narrow the search considerably. A correlation may first be computed
on a coarse grid of points in the second image. Those positions
that show promise (correlation above some threshold) are then
subjected to a hill-climbing procedure to determine local maxima,
and the largest of these local maxima is selected as the match.
Another useful heuristic assumes that once a match for a target
has been found, matches for neighboring targets are likely to lie
in the neighborhood of the first match.
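A minimal sketch of normalized cross-correlation (NCC) with the coarse-grid and hill-climbing heuristics follows. The 4-pixel grid step, the 0.5 threshold and the 8-neighbor climbing rule are illustrative assumptions, and the function names are hypothetical. NCC subtracts each subarea's mean and divides by the root of the product of their energies. It is therefore insensitive to uniform changes of brightness and contrast between the two images.

```python
import numpy as np

# Assumptions: grid step, threshold and 8-neighbor climbing are illustrative choices.


def ncc(target, image, row, col):
    """NCC of `target` with the equal-sized subarea of `image` whose top-left
    corner is (row, col); the result lies in [-1, 1]."""
    h, w = target.shape
    patch = image[row:row + h, col:col + w].astype(float)
    t = target.astype(float) - target.mean()
    p = patch - patch.mean()
    denom = np.sqrt((t * t).sum() * (p * p).sum())
    return float((t * p).sum() / denom) if denom > 0 else 0.0


def hill_climb(target, image, row, col):
    """Step to the best 8-neighbor until no neighbor improves the NCC."""
    h, w = target.shape
    best = ncc(target, image, row, col)
    while True:
        moves = [(row + dr, col + dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)
                 if (dr or dc) and 0 <= row + dr <= image.shape[0] - h
                 and 0 <= col + dc <= image.shape[1] - w]
        score, r, c = max(((ncc(target, image, r, c), r, c) for r, c in moves),
                          default=(best, row, col))
        if score <= best:
            return best, row, col
        best, row, col = score, r, c


def coarse_to_fine_match(target, image, step=4, threshold=0.5):
    """Coarse-grid search, hill-climb from promising points, keep the largest maximum."""
    h, w = target.shape
    best = (-1.0, None, None)
    for r in range(0, image.shape[0] - h + 1, step):
        for c in range(0, image.shape[1] - w + 1, step):
            if ncc(target, image, r, c) > threshold:
                best = max(best, hill_climb(target, image, r, c))
    return best  # (score, row, col); row and col stay None if nothing passed


rng = np.random.default_rng(0)
scene = rng.random((64, 64))
score, row, col = coarse_to_fine_match(scene[20:28, 12:20], scene)
```

The neighborhood heuristic would amount to replacing the full coarse grid with a small grid around the previous target's match.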

The computation of a measure of match at a particular point may be
aided in several ways. One is to apply a threshold not to the
result of the entire subarea calculation but as a running threshold
throughout the correlation over the area. The assumption in this
case is that a true match will be indicated by a large majority of
the elements within one subarea matching the corresponding elements
within the second subarea. If several elements (chosen randomly or
in a specific sampling sequence) differ significantly from their
corresponding elements, a match is unlikely; the procedure should
be terminated and restarted at a different location.
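The running-threshold idea can be sketched as an early-exit comparison. The element tolerance, the number of tolerated mismatches and the random sampling order are assumptions chosen for illustration, and early_exit_mismatch is a hypothetical name. Comparing raw grey levels also assumes the two images have similar brightness.

```python
import numpy as np

# Assumptions: raw grey-level comparison, illustrative tolerance and mismatch count.


def early_exit_mismatch(target, candidate, element_tol, max_bad, rng=None):
    """Compare elements in random order; stop once `max_bad` elements differ
    by more than `element_tol`.

    Returns (likely_match, elements_examined). A true match survives the whole
    pass, while a poor location is usually rejected after a few elements.
    """
    rng = np.random.default_rng() if rng is None else rng
    t = np.asarray(target, dtype=float).ravel()
    c = np.asarray(candidate, dtype=float).ravel()
    bad = 0
    for examined, i in enumerate(rng.permutation(t.size), start=1):  # random sampling sequence
        if abs(t[i] - c[i]) > element_tol:
            bad += 1
            if bad >= max_bad:
                return False, examined  # terminate; restart at a different location
    return True, t.size


target = np.arange(64.0).reshape(8, 8)
print(early_exit_mismatch(target, target, element_tol=5.0, max_bad=3))          # -> (True, 64)
print(early_exit_mismatch(target, target + 100.0, element_tol=5.0, max_bad=3))  # -> (False, 3)
```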

A second way of speeding correlation lies in speeding the
computation itself. Using the Fast Fourier Transform (FFT) to
perform the cross-correlation can be beneficial.
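A sketch of FFT-based correlation follows, assuming NumPy's FFT routines; fft_cross_correlation is a hypothetical name. The target is zero-padded to the image size. The inverse transform of the image spectrum times the conjugate target spectrum then gives the lagged products at every shift at once. Only the raw lagged-product sum is computed here. The mean and energy terms needed for the normalized measure would have to be computed separately.

```python
import numpy as np

# Assumption: raw (unnormalized) correlation only; normalization is left out.


def fft_cross_correlation(image, target):
    """corr[r, c] = sum(target * image[r:r+h, c:c+w]) for every shift at which
    the target fits entirely inside the image."""
    H, W = image.shape
    h, w = target.shape
    F_img = np.fft.rfft2(image.astype(float))
    F_tgt = np.fft.rfft2(target.astype(float), s=(H, W))  # zero-padded target
    full = np.fft.irfft2(F_img * np.conj(F_tgt), s=(H, W))  # circular correlation
    # wrap-around only affects shifts where the target would overhang the edge
    return full[:H - h + 1, :W - w + 1]


rng = np.random.default_rng(1)
scene = rng.random((32, 40))
patch = rng.random((5, 7))
corr = fft_cross_correlation(scene, patch)
```

The direct lagged-product method costs about h*w multiplications per shift. The FFT route costs a few transforms of the whole image regardless of target size, so it gains most for large targets.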


II. Concept Development

Image analysis is often very costly, both in computation time and
in required storage. Disk and tape storage are generally used for
retaining images, with consequent limitations on access speed and
convenience. Anything approaching real-time image processing is
achieved only on dedicated hardware systems or on very large,
expensive dedicated computer systems (ILLIAC, etc.).

Researchers interested in image coding and transmission have
developed a number of methods for reducing the amount of data
required to describe an image. The goal of those researchers has
generally been to present an image that is acceptable to a human
viewer. The structure that humans perceive in images (and by which
they judge acceptability) is precisely the structure often needed
in image analysis for artificial vision. It is reasonable, then, to
look to the field of image coding for methods of data compaction
that may assist image analysis.

Fourier transform coding of images has been used in image analysis
for some time. Features may be extracted from the Fourier transform
plane for use in pattern recognition systems. However, analysis
techniques such as contour following, region growing and corner
recognition are not at all amenable to operations within the
Fourier transform domain. In addition, a heavy computational
penalty may be incurred because of the need for complex arithmetic,
even though FFT techniques alleviate part of the burden.

Other transforms used for image coding are likewise not amenable to
the analysis techniques applied to scenes of solid objects. One
that may be useful, however, is the Walsh/Hadamard transform. Some
work in pattern recognition has involved feature extraction in the
Hadamard domain. This research project is currently investigating
methods of identifying, in the Hadamard transform domain, areas
that are suitable as targets in the image matching process. The
Hadamard transform is particularly attractive computationally,
since it requires no multiplications, only real additions and
subtractions, and a fast Hadamard algorithm exists.
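The absence of multiplications shows clearly in a sketch of the fast Walsh/Hadamard transform. It assumes an unnormalized transform in natural (Hadamard) order and an input length that is a power of two. Other orderings, such as sequency order, permute the outputs. fast_hadamard is a hypothetical name.

```python
# Assumptions: unnormalized transform, natural (Hadamard) ordering, power-of-two length.


def fast_hadamard(values):
    """Fast Walsh/Hadamard transform using only additions and subtractions:
    log2(N) passes of N/2 butterflies each."""
    data = list(values)
    n = len(data)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    span = 1
    while span < n:
        for start in range(0, n, 2 * span):
            for i in range(start, start + span):
                a, b = data[i], data[i + span]
                data[i], data[i + span] = a + b, a - b  # butterfly: no multiplications
        span *= 2
    return data


print(fast_hadamard([1, 2, 3, 4]))  # -> [10, -2, -4, 0]
```

For a two-dimensional subarea the transform is separable. Apply the same routine to every row and then to every column. Applying the unnormalized transform twice returns N times the original data, so the inverse is the same routine followed by division by N.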

This project is also currently evaluating the use of the Hadamard
transform in the matching process itself. Multiplying the Hadamard
transform of a target subarea by the transform of a subarea in the
second image yields the transform of the logical cross-correlation
of the two areas. This cross-correlation will be used as a measure
of match, with large computational savings over FFT or
lagged-product procedures.
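The product property can be checked numerically. The sketch assumes one-dimensional data of power-of-two length and the unnormalized transform. It takes the logical cross-correlation to be r[k] = sum over j of x[j] * y[j XOR k]. Transforming both sequences, multiplying term by term, transforming again and dividing by N gives the same result as the direct definition. The function names are hypothetical.

```python
import numpy as np

# Assumptions: 1-D data, power-of-two length, unnormalized natural-order transform.


def fwht(x):
    """Unnormalized Walsh/Hadamard transform, natural order, vectorized butterflies."""
    x = np.asarray(x, dtype=float)
    n = x.size
    h = 1
    while h < n:
        blocks = x.reshape(-1, 2, h)  # pair each element with the one h places away
        a, b = blocks[:, 0, :], blocks[:, 1, :]
        x = np.stack((a + b, a - b), axis=1).reshape(n)
        h *= 2
    return x


def logical_cross_correlation(x, y):
    """r[k] = sum_j x[j] * y[j XOR k], computed as H(H(x) * H(y)) / N."""
    x, y = np.asarray(x, dtype=float), np.asarray(y, dtype=float)
    return fwht(fwht(x) * fwht(y)) / x.size


def logical_cross_correlation_direct(x, y):
    """Direct O(N^2) definition, for checking."""
    n = len(x)
    return np.array([sum(x[j] * y[j ^ k] for j in range(n)) for k in range(n)])


rng = np.random.default_rng(2)
a = rng.random(16)
b = rng.random(16)
fast = logical_cross_correlation(a, b)
```

The "shift" k here is a bitwise exclusive-OR of indices rather than an ordinary translation. So r[0] is the ordinary zero-shift lagged product, but the other terms do not correspond to displaced subareas.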

Another image coding technique under investigation in this project
is run-length coding. Simply described, the grey value of a run
(horizontal or vertical) of similar pixels is coded along with the
length of the run. Thus two words take the place of the many words
representing individual pixels. Since a run terminates when the
difference between two elements exceeds some threshold, there is an
added benefit for image analysis: edge detection. The start of a
run corresponds to an abrupt change in image characteristics, which
is precisely the measure used to define an edge. This gain in
information together with a decrease in storage space has evidently
not been explored in other image analysis research. This project is
examining efficient analysis techniques in a run-length coded data
base. Contour following and region growing appear particularly
simple in this setting. Data base organization for horizontal and
vertical run-length coding is being explored. The use of run
lengths in addition to grey values as a measure of texture will
also be investigated in the future.
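A sketch of a threshold-terminated encoder for one row follows. The rule for ending a run is an assumption. Here a pixel starts a new run when it differs from the run's stored grey value (the run's first pixel) by more than the threshold. Comparing with the preceding pixel is an equally plausible reading. The start columns of runs are returned as edge candidates. run_length_encode is a hypothetical name.

```python
# Assumption: a run ends when a pixel differs from the run's first pixel by more
# than `threshold`. Pixels should be Python ints; unsigned 8-bit array values
# would wrap around on subtraction.


def run_length_encode(row, threshold):
    """Encode one image row as (grey, length) pairs plus the run-start columns."""
    runs, edges = [], []
    if len(row) == 0:
        return runs, edges
    grey, length = row[0], 1
    for col in range(1, len(row)):
        if abs(row[col] - grey) > threshold:
            runs.append((grey, length))
            edges.append(col)  # abrupt change: a new run (and an edge) starts here
            grey, length = row[col], 1
        else:
            length += 1
    runs.append((grey, length))
    return runs, edges


runs, edges = run_length_encode([10, 12, 11, 80, 82, 81, 81, 10], threshold=5)
# runs -> [(10, 3), (80, 4), (10, 1)], edges -> [3, 7]
```

Eight pixel words become three (grey, length) pairs here, and the two run starts fall exactly where the grey level jumps.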

The run-length encoding process is particularly amenable to fairly
simple hardware implementation.

Methods of increasing matching efficiency in the stereopsis system
by using run-length encoded data are currently under investigation.
The correlation of encoded data can be performed very efficiently.
Correlating the run lengths themselves also provides a new measure
of match: a measure of structural match that is independent of
shading.
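One way the correlation of encoded data might be made efficient is to walk two run lists in step. Each pair of overlapping runs then contributes grey_a * grey_b * overlap at once, so the cost grows with the number of runs rather than the number of pixels. This is an illustration, not the project's method, and rle_dot is a hypothetical name. It computes only the zero-shift lagged product of two rows of equal length. Other shifts could be handled by adjusting the leading run of one row.

```python
# Assumption: both rows have the same total length and are given as (grey, length) runs.


def rle_dot(runs_a, runs_b):
    """Sum of a[i] * b[i] over two run-length coded rows, without expanding them."""
    total = 0
    ia = ib = 0
    left_a = runs_a[0][1] if runs_a else 0
    left_b = runs_b[0][1] if runs_b else 0
    while ia < len(runs_a) and ib < len(runs_b):
        overlap = min(left_a, left_b)  # pixels where both runs are constant
        total += runs_a[ia][0] * runs_b[ib][0] * overlap
        left_a -= overlap
        left_b -= overlap
        if left_a == 0:  # current run of row a exhausted: advance to the next run
            ia += 1
            if ia < len(runs_a):
                left_a = runs_a[ia][1]
        if left_b == 0:
            ib += 1
            if ib < len(runs_b):
                left_b = runs_b[ib][1]
    return total


row_a = [(10, 3), (80, 4), (10, 1)]  # 10 10 10 80 80 80 80 10
row_b = [(1, 2), (2, 6)]             # 1 1 2 2 2 2 2 2
print(rle_dot(row_a, row_b))         # -> 700
```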

In addition to performing image analysis within the run-length
encoded data, it is useful, as mentioned before, to analyze the
Fourier, Hadamard or other transform of the image. This research
project is developing algorithms for obtaining Hadamard and Fourier
transforms directly from the run-length encoded data. The results
should be very efficient computationally.




III. Hardware and Software Development

A two-camera television interface with 256x256 pixels per frame is
being designed and will shortly be constructed in the Signal
Processing Laboratory. A fast buffer memory system has been
designed so that a 1024-pixel window from a TV frame can be
acquired during one frame interval (1/60 second). A Direct Memory
Access (DMA) system is being added to the Adage AGT-30 within the
laboratory. The fast TV buffer will be tied to this DMA to enable
very fast access to 1024 pixels in either TV image. The location of
the window is specified (under program control) as centered around
any arbitrary pixel location. To provide flexibility, the
addressing system for the buffer memory is designed so that the
window may be configured as any rectangular shape from 4x256 pixels
to 256x4 pixels.
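The window geometry can be sketched in a few lines, under hypothetical assumptions:

- both window dimensions are powers of two (the report says only "any rectangular shape" from 4x256 to 256x4);
- shapes are listed as (width, height);
- a window near the frame border is clamped to stay inside the 256x256 frame.

The actual hardware behavior at the border is not described here. The function names are hypothetical.

```python
# Assumptions: power-of-two window sides, (width, height) order, clamping at the border.
FRAME = 256          # pixels per line and lines per frame
WINDOW_PIXELS = 1024


def window_shapes(min_side=4):
    """All (width, height) pairs with width * height == 1024 and power-of-two sides."""
    shapes = []
    width = min_side
    while width <= WINDOW_PIXELS // min_side:
        shapes.append((width, WINDOW_PIXELS // width))
        width *= 2
    return shapes


def window_origin(cx, cy, width, height):
    """Top-left corner of a width x height window centered on pixel (cx, cy),
    clamped so the whole window lies inside the frame."""
    x0 = min(max(cx - width // 2, 0), FRAME - width)
    y0 = min(max(cy - height // 2, 0), FRAME - height)
    return x0, y0


print(window_shapes())
print(window_origin(2, 250, 32, 32))  # -> (0, 224)
```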

The input to the buffer system may be either direct TV data (8 bits
of grey level) or the output of a run-length encoder, which stores
grey level and run length in consecutive locations.

Some basic software for image analysis has been developed so far.
Until the television interface is completed, input comes from
magnetic tapes of images obtained from the Image Processing
Institute of the University of Southern California. Routines to
evaluate the effectiveness of different criteria for run-length
encoding have been developed. Various gradient and thresholding
measures have been implemented. The image derivative can be
computed, with its direction indicated by color in the displayed
result. Software for x-direction run-length encoding has been
developed, as well as a statistical package for evaluating the
storage reductions achieved by various encoding methods. Current
work is on developing two-axis coding and on correlating
run-length coded data.
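A sketch of a derivative display with direction coded by color follows. NumPy central differences give the x and y derivatives, and the direction is quantized into eight codes that could index a color table. The number of codes, the magnitude cutoff and the use of central differences are assumptions; the report does not say which gradient operators were implemented. gradient_direction_codes is a hypothetical name.

```python
import numpy as np

# Assumptions: central differences, 8 direction codes, illustrative magnitude cutoff.


def gradient_direction_codes(image, n_codes=8, min_magnitude=1.0):
    """Gradient magnitude plus a direction code 0..n_codes-1 per pixel.

    Codes could index a color table for display. Angles are measured from the
    +x (column) axis toward increasing row number, so code 2 of 8 means
    "brighter downward". Pixels below `min_magnitude` get code -1.
    """
    gy, gx = np.gradient(image.astype(float))
    magnitude = np.hypot(gx, gy)
    angle = np.arctan2(gy, gx) % (2 * np.pi)  # map to [0, 2*pi)
    codes = np.round(angle / (2 * np.pi / n_codes)).astype(int) % n_codes
    codes[magnitude < min_magnitude] = -1     # no reliable direction
    return magnitude, codes


ramp = np.tile(np.arange(8.0), (8, 1))        # grey level increases to the right
magnitude, codes = gradient_direction_codes(ramp)
```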
FUTURE WORK


Briefly, the project objectives for the future are as follows:

(1) Develop 2-axis run-length coding techniques (area coding).

(2) Develop a fast Hadamard transform that operates on encoded
data, and likewise for the Fourier transform.

(3) Use the fast Hadamard transform (FHT) to perform subarea
correlation.

(4) Develop a feature extraction system using the Hadamard
transform.

(5) Develop routines for obtaining 3-D data once subarea matches
are known.

(6) Develop surface-growing techniques for 3-D data.

(7) Develop an object extraction and analysis system.